The complexity of surgical procedures often poses challenges
for conducting a rigorous and comprehensive evaluation. This
paper considers the final two IDEAL stages of surgical
innovation. Surgical randomised controlled trials are often
challenging to undertake and require careful consideration of
the intervention definition, who should deliver it, and the
impact of surgeon and patient preferences. In the long term
study stage, better monitoring of surgical procedures is
needed, along with improved surveillance of devices.

Introduction 

The IDEAL framework describes the stages through which 
interventional therapy innovation normally passes: idea, 
development, exploration, assessment, and long term 
follow-up (also known as stages 1, 2a, 2b, 3, and 4, respec- 
tively). This paper focuses on the stages of assessment 
(specifically in relation to randomised trials) and long term 
follow-up. By the assessment stage, a new intervention 
will have shown early promise and be used increasingly 
by the surgical community; however, the intervention's 
relative benefit compared with alternative approaches will 
be uncertain. At the long term follow-up stage, a surgical 
intervention will need further assessment owing to techni- 
cal refinements or to related devices or procedures being 
brought onto the market. 

Surgical procedures are conducted with an almost infinite set
of subtle variations: surgeon training, team expertise,
personal practice, centre policy and infrastructure,
anatomical features of the patient, and the use of a variety
of medical devices. Beyond the procedure, other factors are
implicitly part of the intervention: the type of anaesthesia
used, preoperative and postoperative management (including
drug treatments such as aspirin), physiotherapy, and
psychological interventions (including verbal guidance). These
linked and interdependent components produce a complex
intervention. At the assessment and long term stages, this
complexity is most apparent and challenging for conducting a
rigorous and comprehensive evaluation of a surgical
intervention.

Box 1: Potential solutions to overcome common variations
in surgical randomised controlled trials

Surgeon preferences

• Maximise flexibility in the delivery of surgical
interventions, beyond the key distinctive elements, to allow
for variation in surgeon and centre practices

• Implement recruitment of participants by a third party

• Use broad patient eligibility criteria

• Undertake preliminary work to establish consensus regarding
community uncertainty

• Adopt an expertise based trial design

Patient preferences

• Undertake a qualitative evaluation of patients' perspectives
and experiences

Quality control of the intervention

• Use criteria for surgeon eligibility (for example, training
and previous number of cases)

• Record an objective measure of quality (for example, lymph
node yield for gastric cancer surgery)

• Record indicators of surgical decision making (for example,
conversion from partial to total knee replacement, or from
laparoscopic to open surgery)

Another challenge is to measure outcomes comprehensively;
surgical studies are also limited by their selection of
outcomes, which are often short term, "operation" focussed,
and inconsistently defined. Properly conducted randomised
controlled trials and observational studies with agreed and
defined core outcomes are needed at these critical stages.
The failure to conduct methodologically rigorous studies has
resulted in some surgical interventions becoming and
remaining standard practice without good evidence. Similarly,
new medical devices are widely used without due assessment.
In this paper, we consider in turn the roles of randomised
controlled trials in the assessment stage and evaluations in
the long term stage.

Randomised controlled trials in the assessment stage 

The role of randomised controlled trials in evaluating
surgical interventions has been debated over the past 30
years. A consensus in favour of accepting properly conducted
trials as the "gold standard" for comparisons of efficacy and
effectiveness between surgical procedures has eventually
emerged, although not without controversy. While several
surgical trials have been successful and influential, others
have been attempted and failed or have not had the
anticipated influence on the adoption of the intervention.
Even if a trial evaluation is undertaken successfully,
factors out of the study investigators' control (for example,
innovations and technological changes) can lead to
uncertainty about the evaluation's applicability.

The assessment stage provides a window of opportunity (albeit
sometimes a brief one) to obtain definite randomised evidence
about effectiveness. The IDEAL framework proposes that a
large multicentre trial is most valuable and viable during
the assessment stage, although small single centre trials
might appear as early as IDEAL stage 2a. Randomised
controlled trials have an array of potential problems in
evaluating surgical techniques (box 1), and most stem from
three related issues: the intervention definition, who
delivers the intervention, and the treatment preferences of
surgeons and patients.
Table. Surgical trial examples: standardisation of
interventions and eligibility criteria of patients and
surgeons

Research question: Stapler v hand sewn closure after distal
pancreatectomy (DISPACT)
No of centres: 21
Patient/surgeon eligibility: Broad/training given to all
participating surgeons (centre had to perform 10 pancreatic
resections per year)
Standardisation of interventions, perioperative care:
Resection and closure procedures standardised; stapler
specified; type of incision not standardised; no additional
intraoperative treatment or covering of the pancreatic
remnant; splenectomy in addition to distal pancreatectomy at
discretion of surgeon; standardised policy in relation to
octreotide use at sites; pain management not standardised
Standardisation of interventions, preoperative and
postoperative care: Bowel preparation not standardised;
standardised drain use; recovery not standardised (for
example, location, feeding, and mobilisation)

Research question: Treatment of tibial fractures with reamed
or non-reamed intramedullary nails (SPRINT)
No of centres: 29
Patient/surgeon eligibility: Broad/no reported restrictions
Standardisation of interventions, perioperative care:
Standardised reamed nailing and unreamed nailing procedures
Standardisation of interventions, preoperative and
postoperative care: Standardised antibiotic use in open and
closed fractures; standardised mobilisation (both open and
closed) and use of growth stimulation not allowed during
first 12 months; wound closure (open); dynamisation of the
nail (both) and reoperation (open) only allowed in specific
circumstances

Research question: Treatment of abdominal aortic aneurysm
(EVAR trial 1) with endovascular or open repair
No of centres: 34
Patient/surgeon eligibility: Broad/hospitals had to complete
20 endovascular repair procedures
Standardisation of interventions, perioperative care: Choice
of EVAR device left to participating surgeons, otherwise no
stated restrictions
Standardisation of interventions, preoperative and
postoperative care: No stated restrictions

Research question: Treatment of premenopausal women with
abnormal uterine bleeding with surgical or medical treatment
(Ms study)
No of centres: 2
Patient/surgeon eligibility: Broad/no stated restrictions
Standardisation of interventions, perioperative care: Type
and route of hysterectomy at discretion of surgeon;
prophylactic oophorectomies were discouraged
Standardisation of interventions, preoperative and
postoperative care: No stated restrictions

Research question: Treatment of osteoarthritis of the knee
with arthroscopic debridement, arthroscopic lavage, or
placebo surgery
No of centres: 1
Patient/surgeon eligibility: Broad/one surgeon performed all
operations
Standardisation of interventions, perioperative care:
Surgical interventions standardised
Standardisation of interventions, preoperative and
postoperative care: Standardised anaesthetic; mobilisation
and pain management (analgesics)


Intervention definition 

How tightly the intervention should be defined will depend
on the type of comparison (table). Trials investigating the
auxiliary facets of the intervention are valuable, but
studies evaluating the surgical core of an innovative
procedure (whether a new procedure or a modification of an
established procedure) are crucial. In trials comparing
medical with surgical treatment, the definition of surgery
can be broad. For example, in a trial of medical treatment
versus hysterectomy, the type and route of the hysterectomy
were left to the discretion of the gynaecologists, as was the
medical treatment (although there was a suggested regimen).
In a trial comparing open versus laparoscopic repair of
inguinal hernia, surgeons were allowed to choose the type of
open and laparoscopic repair.

If special equipment is needed, the medical device used does
not typically need to be restricted. In trials of medical
devices, or where the related procedures being compared are
similar, it may be necessary to define each intervention
precisely and to introduce process control measures to check
on compliance to preclude contamination and control the
effect of ancillary care. Small changes in technique or
technology can have a substantial effect on outcomes, as
shown by recent research relating to metal-on-metal hip
devices.

Measuring adherence regarding intervention delivery has been
rare in surgical trials, but can help in interpreting the
applicability (generalisability) of the results. Example
measures include specimen margin examination or node counts
in cancer procedures, or taking photographs after completion
of key parts of the procedure. Deciding on restrictions
requires careful consideration of the research question and
the potential risk of bias and confounders (such as
associated treatments), although having as few restrictions
as possible is preferable.

Who should deliver the intervention?

Every operation should be carried out or supervised closely
by someone with an appropriate level of expertise and
training. Collectively, participating surgeons should have
sufficient expertise in order for the surgical community to
embrace the trial and its findings. The traditional approach,
where each surgeon delivers both or all surgical
interventions in the trial, has been criticised. A comparison
could be deemed unfair if surgeons have more expertise in one
intervention than another.

This problem can be managed in two ways. Firstly, trial
participation can be restricted to surgeons with an
acceptable level of expertise in both or all surgical
interventions. Surgeon eligibility criteria have generally
focussed on markers of training and previous experience of
the intervention (for example, completing 10 laparoscopic
hernia procedures). Professional grade, years of experience,
and annual caseload can be used as markers, although a more
rigorous standard of direct demonstration of surgical
competency has also been proposed (for example, providing
training and supervision before participation). Under the
second approach, participating surgeons deliver only the
interventions in which they have expertise (an expertise
based trial). There is limited evidence about how well this
approach works to date, and such designs are not without
statistical and practical disadvantages. Whatever approach is
adopted, other factors can lead to differences in outcome
between surgeons (such as ancillary care and centre admission
policies), although they are rarely, if ever, fully
standardised.

Impact of treatment preferences 

The preferences of both patients and surgeons are a key
factor that affects the success of a randomised controlled
trial, and can be the decisive influence upon recruitment. If
patients tend to prefer one of the treatments, they are
unlikely to agree to be randomised in case they are assigned
to another treatment. The merit of an otherwise well designed
and conducted trial can be fatally undermined if too few
surgeons are willing to be involved. There is, however, a
strong relationship between the patient and the surgeon, who
may have his or her own strong preferences and has
traditionally acted as gatekeeper and facilitator. Recent
evidence demonstrated that patients' preferences expressed
during consultations could be influenced by surgeon
recruiters, who seemed to unconsciously transmit their own
preferences during the consent process. Properly informed
patients could be more likely to consent to randomisation.

Box 2: Example of observational study at the long term study
stage

Implanted devices for total hip replacement

Clinical background at time of conduct

• Total hip replacement is widely undertaken although revision
is sometimes necessary, particularly in younger recipients

• Alternative devices with a larger head size and different
bearing surface materials (such as metal-on-metal devices)
have increasingly been used to reduce revisions

Design

• Long term, device surveillance study nested within a
population based registry

• Primary stemmed operations of total hip replacement done
between 2003 and 2011 (n=402 051) were linked to revision
operations

• Operations involving different types of devices (varying
bearing surface and head size)

Findings

• Metal-on-metal devices had poorer survival than devices with
alternative surfaces

• Lower device survival found in women, and for those devices
with larger heads

For multicentre trials, a pragmatic approach to patient
eligibility that achieves general agreement and understanding
is important. Surgeon preferences often depend on the
patient's prognosis. The Spine Stabilisation Trial explicitly
adopted broad inclusion criteria, using an approach to
recruitment based on the "uncertainty principle": surgeons
could restrict randomisation to eligible patients for whom
they personally were uncertain which intervention would be
the best option (known as personal equipoise). However, this
approach seems to have led to misunderstanding among
participating surgeons about the pragmatic nature of the
trial design and the explicit aim of seeking to recruit a
wide spectrum of potential patients. A successful example is
the first EVAR trial on endovascular aneurysm repair versus
open repair in patients with abdominal aortic aneurysm, which
had broad eligibility criteria and achieved its recruitment
target. Transmission of preference can be mitigated if
consent is obtained by a trained and possibly neutral
recruiter, who is not delivering any intervention (such as a
research nurse). The merits of this approach will differ
according to the research question.

Long term study stage 

Although the benefit of a particular surgical intervention
(for example, knee replacement) might be well established,
the use of a particular variation in the approach (such as
using a posterior approach) or device selection is often open
to question long after widespread adoption. This provides an
opportunity to obtain good evidence about safety and
effectiveness of techniques or technologies from
observational and surveillance studies. Current research on
long term surveillance focuses mainly on medical devices and,
in the case of surgery, implantable devices. One reason for
this focus is the cost; the medical device market in 2008 was
estimated to exceed £150bn (€177bn; $232bn) worldwide.
Surveillance of the long term effect of surgical innovation
(both in terms of the procedure and devices) is imperative
even if short term benefit has been established (at IDEAL
stages 1-3). The US Food and Drug Administration (FDA)
recently developed a conceptual framework for medical devices
specifically, although it needs to be developed and refined
in the context of long term surveillance. Box 2 provides an
example of a long term surveillance study.

Long term evaluation of procedures 

Well designed, large observational studies (for example, 
based on registries) can be used to evaluate procedures 
in the long term study stage; they can also provide data 
for outcomes in subgroups of interest as well as rare
endpoints in safety and effectiveness. From an assessment
perspective, some national or nationally representative 
patient registries can be defined as observational studies 
collecting "uniform data, to evaluate specified outcomes 
for a population defined by a particular disease, condi- 
tion, or exposure." Registries can be designed to capture 
data for specific conditions or exposures (such as surgery 
or devices), types of healthcare service delivered (such as 
surgical treatment or diagnostic procedure), or specific 
outcomes (such as an adverse event, disorder, or disease), 
to improve the delivery of care. 

Practical factors often determine which data are collected,
but in principle, disease based registries have the advantage
of enabling consideration of selection for a procedure and
potential for associated bias in an evaluation. Procedure
registries can provide useful comparative evidence for
different interventions and devices. Longstanding procedure
registries include those developed by professional societies,
such as the UK Society for Cardiothoracic Surgery's adult
cardiac surgery database or the Society of Thoracic Surgeons'
registry. Other studies (including randomised controlled
trials) can be nested in them. The Swedish national registry
of gallstone surgery and endoscopic retrograde
cholangiopancreatography (GallRiks) enabled a large cohort
study to quantify survival and incidence of bile duct
injuries and explore the relation between them.

The choice of surgical procedure or medical device can often
vary greatly, even for similar patients within and between
centres. For example, the figure shows the proportion of
operations using abdominal access (versus thoracic access) in
hiatal hernia repair, in hospitals in the Nationwide
Inpatient Sample. The choice of access route was strongly
influenced by surgeon practice and institutional culture, and
was unlikely to be related to the hernia location for many
hospitals. In most hospitals, the majority of hernia repairs
were conducted via abdominal access; that is, the hernia
location did not dictate the approach and therefore any
confounding by indication will probably be limited.
Identifying such practice and surgeon patterns can,
therefore, help clarify the extent to which selection of
patients for receiving the procedure (or device) may have
occurred. Exploration of variations in practice can improve
the design of comparative studies, by providing insight
regarding the likelihood of the main potential bias,
confounding by indication.

The modes of follow-up are critical, as are completeness and
accuracy of data collection; deficiencies in these can lead
to loss to follow-up and various misclassification biases
(for example, outcomes of difficult operative cases being
attributed to revision surgery or a medical device).
Standardisation of terminology, as in earlier stages, would
allow routine capture of information on surgical procedures
such as the use of a laparoscopic approach, laterality (side
of surgery), and device information. Similarly, standardised
terminology for devices (such as from the Clinical Data
Interchange Standards Consortium or product catalogues) will
help to accurately describe the specific attributes and
properly identify the technology used.

Figure: Use of the abdominal route (versus thoracic route) as
a proportion of operations in hiatal hernia repair, by
hospital (x axis: hospitals). Data taken from US hospitals in
the Nationwide Inpatient Sample; figure shows 708 hospitals
(900 hospitals with 100% abdominal route use not shown).
Blue=percentage of operations with abdominal route; black=95%
exact (binomial) confidence intervals. Use of the abdominal
route varied from 0% to 100% across hospitals
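
The figure reports 95% exact (binomial) confidence intervals for each hospital's proportion of abdominal route operations. As an illustration only, the sketch below computes such an interval by inverting the binomial tail probabilities (the approach usually called Clopper-Pearson; the source does not state which exact method was used, so this is an assumption). The function names and the hospital counts are hypothetical and are not taken from the Nationwide Inpatient Sample.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that decreases from f(lo) > 0 to f(hi) < 0."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Exact two-sided interval for a proportion k/n (assumed Clopper-Pearson)."""
    # Lower limit: the p at which P(X >= k) equals alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper limit: the p at which P(X <= k) equals alpha/2 (1 when k == n).
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical hospital: 18 of 20 repairs done via the abdominal route.
    low, high = exact_ci(18, 20)
    print(f"proportion={18 / 20:.2f}, 95% exact CI=({low:.3f}, {high:.3f})")
```

Hospitals with few operations get wide intervals under this method, which is one reason apparent variation between small centres should be read with caution.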

However, such studies have inherent and common limitations.
Firstly, the concept of "intention to treat" does not readily
map onto observational data, and only the first procedure or
device used is readily interpretable. If a
patient subsequently receives another procedure that 
converts failure after the first procedure into success, this 
outcome could be misattributed to the first procedure. 
Observational studies (including registry based studies) 
should, whenever possible, construct intention to treat 
analyses that correspond to treatment decisions as they 
occur in the real world. However, key data are often not 
routinely collected. Reasoned inferences can be made 
from the clinical scenario using resource use data. For 
example, if routine data show that both partial and total 
knee devices were used in the same knee during an opera- 
tion, it can be appropriately inferred that it was necessary 
to change a partial device to a total one, as it is impossible 
for the opposite to occur. 

Secondly, the time between cohort entry (assignment 
to procedure) and date of first exposure (actual delivery of 
procedure) is often not recorded. This leads to "immortal 
time bias," a period of follow-up after assignment during 
which outcomes of treatment that determine the end of 
follow-up cannot occur, as the treatment has not yet hap- 
pened. This bias can confound results, because interven- 
tions that are delivered faster could look worse than those 
needing more time (for example, sicker patients might die 
before receiving the therapy and being accounted for). 
Finally, another difficult situation is when the originally 
assigned (intended) treatment switches before initiation. 
This switch can be due to patient refusal, financial fac- 
tors, or other considerations. Again, such information is 
not typically collected by observational data sources but 
is often needed for meaningful interpretation. 
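
The effect of unrecorded waiting time can be made concrete with a small worked example. The sketch below uses an invented five-patient cohort (all values hypothetical, as are the class and function names) and compares a naive analysis, which counts follow-up for treated patients from assignment, with one that splits each patient's follow-up at the date the procedure was actually delivered. It is a minimal illustration of the bias under these assumptions, not a recommended analysis method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    treated_day: Optional[int]  # days from assignment to procedure; None if never delivered
    end_day: int                # days from assignment to death or end of follow-up
    died: bool


# Hypothetical cohort, invented purely for illustration.
COHORT = [
    Patient(30, 130, True),
    Patient(60, 400, False),
    Patient(90, 200, True),
    Patient(None, 20, True),   # died while waiting for the procedure
    Patient(None, 45, True),   # died while waiting for the procedure
]


def naive_treated(cohort):
    """Deaths and person-days for treated patients, with follow-up counted
    from assignment, so the waiting ("immortal") time is credited to treatment."""
    treated = [p for p in cohort if p.treated_day is not None]
    return sum(p.died for p in treated), sum(p.end_day for p in treated)


def time_split(cohort):
    """Split follow-up at the procedure date: time before delivery counts as
    waiting, time after delivery counts as treated."""
    waiting_deaths = waiting_days = treated_deaths = treated_days = 0
    for p in cohort:
        if p.treated_day is None:
            waiting_days += p.end_day
            waiting_deaths += p.died
        else:
            waiting_days += p.treated_day
            treated_days += p.end_day - p.treated_day
            treated_deaths += p.died
    return (waiting_deaths, waiting_days), (treated_deaths, treated_days)


def rate_per_1000_days(deaths, days):
    return 1000 * deaths / days


if __name__ == "__main__":
    print("naive treated rate:", rate_per_1000_days(*naive_treated(COHORT)))
    waiting, treated = time_split(COHORT)
    print("waiting rate:", rate_per_1000_days(*waiting))
    print("treated rate (time split):", rate_per_1000_days(*treated))
```

In this toy cohort the naive analysis understates the death rate after treatment because the waiting days inflate the denominator, and it omits the deaths that occurred before the procedure could be delivered. The time-split version is only possible when the delivery date is recorded, which is the data gap the text describes.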



SUMMARY POINTS

• Rigorous evaluation of surgical innovations is needed in the
assessment and long term study stages, which together meet
the need for comprehensive outcome assessment

• Randomised trials of surgical interventions, along with
observational studies in the long term study stage, should be
designed to acknowledge the complexity of surgery

• Key issues for surgical trial design are specification of
the interventions, who will deliver the interventions, and
assessing the potential impact of patient and surgeon
preferences

• Long term evaluation of the procedure and any related
devices is needed, along with the development of data
collection and methodology for surveillance

Surveillance of devices 

In the US and UK, manufacturers and importers are 
required to submit reports of device related deaths, seri- 
ous injuries, and malfunctions to the regulatory bodies. 
US hospitals and nursing homes are required to submit 
reports of device related deaths and serious injuries to the 
manufacturer and only deaths to the FDA, but healthcare 
providers and consumers can submit reports voluntarily 
(through MedWatch). Such passive reporting systems
typically have important weaknesses, including: 

• Incomplete or inaccurate data that are usually not 
independently verified 

• Data reflecting reporting biases driven by event 
severity or uniqueness, publicity, or litigation 

• Causality cannot be inferred from any individual 
report 

• Events are generally under-reported and this, in 
combination with lack of denominator (exposure) 
data, precludes determination of event incidence or 
prevalence. 

However, reports received through passive and 
enhanced systems are often useful and have resulted in 
important public health alerts related to: 

• Transvaginal placement of surgical mesh 

• Use of recombinant bone morphogenetic protein in 
cervical spine fusion 

• Interactions induced by magnetic resonance 
imaging in patients with implanted neurological 
stimulators.

In addition, the FDA has developed an enhanced surveillance
system using several different modes of surveillance,
including active surveillance. This system, known as the
Medical Product Safety Network, provides national
surveillance of medical devices based on a representative
subset of user facilities. Routine data collection and
monitoring for devices need improvement. Finally, when
resources are available, active surveillance based on
registries can also help monitor high risk surgery and
devices, such as a national registry of implanted ventricular
assist devices.

Summary 

A large, multicentre, randomised controlled trial in the
assessment stage complements observational evaluation in the
long term study stage. Large and preferably national patient
registries are best suited for long term surveillance studies
of surgical procedures. Surveillance of devices with improved
data collection is needed. Owing to the inherent complexity
of surgery and variation in practice, both randomised
controlled trials and surveillance studies face particular
challenges. However, solutions are often available, and such
difficulties should not prevent rigorous and comprehensive
evaluation of surgical innovations.